 data wrangler


PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets

Kim, Jaeyoung, Lee, Sihyeon, Jeon, Hyeon, Lee, Keon-Joo, Bae, Hee-Joon, Kim, Bohyoung, Seo, Jinwook

arXiv.org Artificial Intelligence

Acute stroke demands prompt diagnosis and treatment to achieve optimal patient outcomes. However, the intricate and irregular nature of clinical data associated with acute stroke, particularly blood pressure (BP) measurements, presents substantial obstacles to effective visual analytics and decision-making. Through a year-long collaboration with experienced neurologists, we developed PhenoFlow, a visual analytics system that leverages collaboration between humans and Large Language Models (LLMs) to analyze the extensive and complex data of acute ischemic stroke patients. PhenoFlow pioneers an innovative workflow, where the LLM serves as a data wrangler while neurologists explore and supervise the output using visualizations and natural language interactions. This approach enables neurologists to focus more on decision-making with reduced cognitive load. To protect sensitive patient information, PhenoFlow only utilizes metadata to make inferences and synthesize executable code, without accessing raw patient data. This ensures that the results are both reproducible and interpretable while maintaining patient privacy. The system incorporates a slice-and-wrap design that employs temporal folding to create an overlaid circular visualization. Combined with a linear bar graph, this design aids in exploring meaningful patterns within irregularly measured BP data. Through case studies, PhenoFlow has demonstrated its capability to support iterative analysis of extensive clinical datasets, reducing cognitive load and enabling neurologists to make well-informed decisions. Grounded in long-term collaboration with domain experts, our research demonstrates the potential of utilizing LLMs to tackle current challenges in data-driven clinical decision-making for acute ischemic stroke patients.
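
The abstract does not spell out how temporal folding works, so the following is only a rough sketch of one plausible reading: the timeline is sliced into fixed-length periods, and each measurement is wrapped onto a circle by its phase within its period. The function name `fold_timestamp`, the `origin` argument, and the 24-hour default period are assumptions made for illustration and are not taken from PhenoFlow.

```python
import math
from datetime import datetime


def fold_timestamp(timestamp, origin, period_hours=24.0):
    """Return (cycle_index, angle_in_radians) for one measurement.

    Hypothetical helper: the 24-hour default period is an assumption,
    not a detail stated in the abstract.
    """
    elapsed_hours = (timestamp - origin).total_seconds() / 3600.0
    # The quotient says which slice the measurement falls in;
    # the remainder is its position inside that slice.
    cycle, phase = divmod(elapsed_hours, period_hours)
    angle = 2.0 * math.pi * phase / period_hours
    return int(cycle), angle


# Irregularly timed readings still land on a shared circular axis.
origin = datetime(2024, 1, 1, 0, 0)
readings = [
    datetime(2024, 1, 1, 6, 0),
    datetime(2024, 1, 2, 6, 0),
    datetime(2024, 1, 2, 18, 0),
]
folded = [fold_timestamp(t, origin) for t in readings]
```

Readings taken at the same time of day on different days get the same angle and different cycle indices, which is what allows the slices to be overlaid.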


Become an AWS SageMaker Machine Learning Engineer in 30 Days - Development

#artificialintelligence

Section 4 (Days 11 – 18): we will: (1) learn machine learning regression fundamentals, including simple/multiple linear regression and least sum of squares, (2) build our first simple linear regression model in Scikit-Learn, (3) list all available built-in algorithms in SageMaker, (4) build, train, test and deploy a machine learning regression model using the SageMaker Linear Learner algorithm, (5) list regression KPIs such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Percentage Error (MPE), Coefficient of Determination (R2), and adjusted R2, (6) launch a training job using the AWS Management Console and deploy an endpoint without writing any code, (7) cover the theory and intuition behind the XGBoost algorithm and how to use it to solve regression problems in Scikit-Learn and with SageMaker built-in algorithms, (8) learn how to train an XGBoost algorithm in SageMaker using AWS JumpStart, assess trained ...
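
The regression KPIs named in item (5) can be sketched in plain Python as follows. This is an illustrative sketch rather than course material: the function name `regression_metrics` is hypothetical, and MPE is computed here with the sign convention (actual - predicted) / actual, which is an assumption since conventions differ.

```python
import math


def regression_metrics(y_true, y_pred, n_features=1):
    """Compute common regression KPIs for paired actual/predicted values.

    Assumes no actual value is zero (needed for MPE) and that
    len(y_true) > n_features + 1 (needed for adjusted R2).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Mean Percentage Error keeps the sign, so over- and
    # under-predictions can cancel out.
    mpe = 100.0 * sum(e / t for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalizes the number of input features.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

    return {"mae": mae, "mse": mse, "rmse": rmse,
            "mpe": mpe, "r2": r2, "adj_r2": adj_r2}


metrics = regression_metrics([2, 4, 6, 8], [3, 4, 5, 8])
```

In practice a library such as Scikit-Learn provides tested implementations of most of these; writing them out only makes the definitions explicit.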


Refit trained parameters on large datasets using Amazon SageMaker Data Wrangler

#artificialintelligence

Amazon SageMaker Data Wrangler helps you understand, aggregate, transform, and prepare data for machine learning (ML) from a single visual interface. It contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code. Data science practitioners generate, observe, and process data to solve business problems where they need to transform and extract features from datasets. Transforms such as ordinal encoding or one-hot encoding learn encodings on your dataset. These encoded outputs are referred to as trained parameters.

  Industry: Retail > Online (0.40)
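
To make the idea of trained parameters concrete, here is a minimal plain-Python sketch of an ordinal encoding that is learned on a sample and then refit on a larger dataset. It does not use or imitate the Data Wrangler API; the function names `fit_ordinal_encoder` and `apply_ordinal_encoder`, and the choice of -1 for unseen categories, are assumptions made for illustration.

```python
def fit_ordinal_encoder(values):
    """Learn a category -> integer mapping (the 'trained parameters')."""
    return {category: index
            for index, category in enumerate(sorted(set(values)))}


def apply_ordinal_encoder(mapping, values, unknown=-1):
    """Encode values with a learned mapping; unseen categories get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


# Parameters learned on a small sample miss categories that only
# appear in the full dataset.
sample = ["red", "green", "blue", "green"]
full = sample + ["yellow", "red"]

sample_params = fit_ordinal_encoder(sample)
stale = apply_ordinal_encoder(sample_params, full)

# Refitting on the full dataset updates the trained parameters.
refit_params = fit_ordinal_encoder(full)
fresh = apply_ordinal_encoder(refit_params, full)
```

The `stale` encoding marks `"yellow"` as unknown, while the `fresh` encoding assigns it its own integer, which is the motivation for refitting when the dataset grows.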

Integrate Amazon SageMaker Data Wrangler with MLOps workflows

#artificialintelligence

As enterprises move from running ad hoc machine learning (ML) models to using AI/ML to transform their business at scale, the adoption of ML Operations (MLOps) becomes inevitable. As shown in the following figure, the ML lifecycle begins with framing a business problem as an ML use case followed by a series of phases, including data preparation, feature engineering, model building, deployment, continuous monitoring, and retraining. For many enterprises, a lot of these steps are still manual and loosely integrated with each other. Therefore, it's important to automate the end-to-end ML lifecycle, which enables frequent experiments to drive better business outcomes. Data preparation is one of the crucial steps in this lifecycle, because the ML model's accuracy depends on the quality of the training dataset.

  Genre: Workflow (1.00)
  Industry: Retail > Online (0.40)

An Introduction to Amazon SageMaker

#artificialintelligence

Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality machine learning models by bringing together a broad set of capabilities purpose-built for machine learning. Amazon SageMaker provides a set of solutions for the most common use cases that can be deployed with just a few clicks, making it easier to get started. Amazon SageMaker is a fully managed machine learning service. Data scientists and developers can quickly and easily build and train machine learning models with SageMaker, then deploy them directly into a production-ready hosted environment.


Prepare data faster with PySpark and Altair code snippets in Amazon SageMaker Data Wrangler

#artificialintelligence

Amazon SageMaker Data Wrangler is a purpose-built data aggregation and preparation tool for machine learning (ML). It allows you to use a visual interface to access data and perform exploratory data analysis (EDA) and feature engineering. The EDA feature comes with built-in data analysis capabilities for charts (such as scatter plot or histogram) and time-saving model analysis capabilities such as feature importance, target leakage, and model explainability. The feature engineering capability has over 300 built-in transforms and can perform custom transformations using either Python, PySpark, or Spark SQL runtime. For custom visualizations and transforms, Data Wrangler now provides example code snippets for common types of visualizations and transforms.


Automate ML Development With Amazon Sagemaker - Analytics Vidhya

#artificialintelligence

This article was published as a part of the Data Science Blogathon. Amazon SageMaker is arguably the most powerful, feature-rich, and fully managed machine learning service developed by Amazon. From creating your own labeled datasets to deploying and monitoring models in production, SageMaker is equipped to do everything. It also provides an integrated Jupyter notebook instance for easy access to your data for exploration and analysis, so you don't have to fiddle around with server configuration. SageMaker supports bring-your-own algorithms and frameworks, which offer flexible distributed training options that adjust to your specific workflows.


Amazon SageMaker Autopilot now supports time series data

#artificialintelligence

Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning (ML) models based on your data, while allowing you to maintain full control and visibility. We have recently announced support for time series data in Autopilot. You can use Autopilot to tackle regression and classification tasks on time series data, or sequence data in general. Time series data is a special type of sequence data where data points are collected at even time intervals. Manually preparing the data, selecting the right ML model, and optimizing its parameters is a complex task, even for an expert practitioner.


Save costs by automatically shutting down idle resources within Amazon SageMaker Studio

#artificialintelligence

Amazon SageMaker Studio provides a unified, web-based visual interface where you can perform all machine learning (ML) development steps, making data science teams up to 10 times more productive. Studio gives you complete access, control, and visibility into each step required to build, train, and deploy models. Studio notebooks are collaborative notebooks that you can launch quickly because you don't need to set up compute instances and file storage beforehand. Amazon SageMaker is a fully managed service that offers capabilities that abstract the heavy lifting of infrastructure management and provides the agility and scalability you desire for large-scale ML activities with different features and a pay-as-you-use pricing model. In Studio, running notebooks are containerized separately from the JupyterServer UI in order to de-couple compute infrastructure sizing.


Prepare data from Snowflake for machine learning with Amazon SageMaker Data Wrangler

#artificialintelligence

Data preparation remains a major challenge in the machine learning (ML) space. Data scientists and engineers need to write queries and code to get data from source data stores, and then write the queries to transform this data, to create features to be used in model development and training. All of this data pipeline development work doesn't really focus on the building of ML models, but focuses on the building of data pipelines necessary to make the data available to the models. Amazon SageMaker Data Wrangler makes it easier for data scientists and engineers to prepare data in the early phase of developing ML applications by using a visual interface. Data Wrangler comes with over 300 built-in data transformations to help normalize, transform, and combine features without writing any code. You can now use Snowflake as a data source in Data Wrangler to easily prepare data in Snowflake for ML.